How to handle SIGSEGV, but also generate a core dump
Recently I ran into this problem. How do you capture SIGSEGV with a signal handler and still generate a core file?
The problem is that once you install your own signal handler for SIGSEGV, Linux no longer runs the default action, which is what generates the core file. So once you get SIGSEGV, consider all that useful information about the origin of the exception lost.
Luckily, there’s a solution. Here’s what I did.
You start by registering a signal handler. Once you get the signal, inside the signal handler, set the handler for that signal back to SIG_DFL. Then send yourself the same signal, using the kill() system call. Here’s a short code snippet that demonstrates this little trick in action.
#include <stdio.h>
#include <sys/types.h>
#include <unistd.h>
#include <signal.h>

void sighandler(int signum)
{
    printf("Process %d got signal %d\n", getpid(), signum);
    signal(signum, SIG_DFL);
    kill(getpid(), signum);
}

int main()
{
    signal(SIGSEGV, sighandler);
    printf("Process %d waits for someone to send it SIGSEGV\n", getpid());
    sleep(1000);
    return 0;
}
Note that this code doesn’t actually cause a segmentation fault. To simulate a segmentation fault, I ran kill -11 <pid> from the command line. This is what happened.
$ ls
sigs.c
$ gcc sigs.c
$ ./a.out
Process 2149 waits for someone to send it SIGSEGV
Process 2149 got signal 11
Segmentation fault (core dumped)
$ ls
a.out* core sigs.c
Obviously, without lines 9 and 10 in the code (the signal() and kill() calls inside the handler), there would be no core file.
By the way, you can use this technique to handle any core generating exception – SIGILL, SIGFPE, etc.
Thanks Alexander. Nice trick. I was looking for exactly this.
@dam
Thanks! Please visit again
I used this code in a shell I wrote in order to use ctrl-c & ctrl-v on processes the shell runs.
In some scenarios it caused a segmentation fault.
For example: a process that was suspended (on the suspended Q) was moved to the foreground (and removed from the Q), and then the handler caused a segfault. Do you know why? It also happens when I don’t remove it from the Q but only move it from suspended to FG.
Thanks
Did you try this way?
http://www.cs.utah.edu/dept/old/texinfo/glibc-manual-0.02/library_21.html#SEC353
Hi Alexander,
I am new to Linux and trying to understand signals. Your articles are very interesting. I have tried the above program, but didn’t see the core dump file. This is what I have tried:
pradeepk@ipglx29> ./a.out
Process 18123 waits for someone to send it SIGSEGV
Process 18123 got signal 11
Segmentation fault
ls -l
total 11
-rwxrwxr-x 1 pradeepk users 9265 2010-09-20 12:15 a.out*
-rw-rw-r-- 1 pradeepk users 396 2010-09-20 12:15 sig.c
~/IPC [ NONE ]
From another terminal I ran kill -11 pid.
Thanks for the help.
Regards
Pradeep
Nice, works, thanks Alexander! (btw, without this, catching a (real, null-pointer-dereference) SIGSEGV results in calling the signal handler again and again, presumably because the control returns to the faulting address after the handler runs… so this thing seems to be quite useful.)
@pradeep:
core dumps aren’t created by default, you have to explicitly enable them, see for example this article:
http://aplawrence.com/Linux/limit_core_files.html
@pradeep and @latanius, it seems you guys have found each other.
Oh btw, I often use “ulimit -c unlimited” – that’s because I usually have no idea what will be the size of the core file.
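If you would rather not depend on the shell's ulimit setting, the program can raise the limit itself. Here is a minimal sketch, assuming you are allowed to lift the soft limit up to the hard limit; the name enable_core_dumps is made up for this example. Note that it matches "ulimit -c unlimited" only when the hard limit is itself unlimited.

```c
#include <sys/resource.h>

/* Hypothetical helper: raise the soft core-file size limit to the hard
 * limit for the current process. Returns 0 on success, -1 on error. */
int enable_core_dumps(void)
{
    struct rlimit rl;

    if (getrlimit(RLIMIT_CORE, &rl) != 0)
        return -1;

    rl.rlim_cur = rl.rlim_max;   /* as large as the hard limit permits */
    return setrlimit(RLIMIT_CORE, &rl);
}
```

Call it early in main(), before anything can crash. On Linux the location and name of the core file are still decided by /proc/sys/kernel/core_pattern, so the file may not land in the current directory.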
Thanks to both of you for your notes and please come again
I see you are re-sending the same signal. Since the original signal is not guaranteed to be one that produces a core dump, have you considered:
gcore(), abort(), or { char *cp = 0; *cp = '1'; }
all of which seem to be fairly short and highly portable. Watch out for the last one if you have SIGBUS or SIGSEGV handlers installed. Also, you may need to use system() or fork/exec to run the gcore utility on Linux, since gcore does not seem to be standard on Linux yet.
@Chris Eleveld
I didn’t try any of these. What will happen if you open the core file that you generated with *cp = '1'; in gdb? I bet it will show the line in the signal handler that generated the exception. If you do it the way I suggested, it will pinpoint the place where the exception took place and not the place you resent it from.
Why reconfigure the signal handler? SIG_DFL is just a function pointer. You can also call SIG_DFL directly, something like this:
…
void sighandler(int signum)
{
printf("Process %d got signal %d\n", getpid(), signum);
SIG_DFL(signum);
}
…
@Andreas
I believe you are right, but I’d still stick to the method I described. This is because the type of SIG_DFL is implementation dependent. No one promises that it is a pointer to a valid function.
You can call abort() from the signal handler; it will also generate a core dump, so there is no need for the magic you described.
Using abort() in your signal handler will raise signal SIGABRT.
Yes, you get a core file…… of your signal handler, not the process that actually crashed.
How did you send it signal 11? I ran it the same way but never got signal 11.
@Andreas –
yes SIG_DFL’s type is a function pointer, however, if you look at what its value is it is set to 0. So obviously you couldn’t call SIG_DFL(signum). A signal is actually captured first by the kernel and it can be forwarded on to a custom signal handler if the signal handler is non-zero. Alexander has it right to consider SIG_DFL implementation dependent and not assume it is a pointer to a valid function (which the current implementation does not have it going to a valid function).
@id
Doesn’t kill -11 from the command line work?
If you set up a custom handler using sigaction(), and set sa_flags = SA_RESTART, then after reverting the handler to SIG_DFL, you won’t need to send yourself SIGSEGV, as the second attempt to execute the offending instruction will cause a normal segfault.
Not sure how this is more useful, just a thought.
[…] creation of a core dump, or signal something that a core dump is about to occur using this trick: How to handle SIGSEGV, but also generate a core dump – Alex on Linux But don't call printf like in the example since that's not in the list. […]
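To illustrate the point about printf, here is one possible way to print the signal number using only calls that are safe inside a signal handler (write(), signal(), kill(), getpid()). This is just a sketch: format_signal_msg is a made-up helper and the message text is an assumption, but the handler otherwise follows the same reset-and-resend trick from the article.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: writes "got signal <n>\n" into buf without stdio.
 * Returns the number of bytes stored; the result is NOT null-terminated. */
size_t format_signal_msg(char *buf, size_t cap, int signum)
{
    static const char prefix[] = "got signal ";
    char digits[12];
    size_t ndigits = 0;
    size_t len = 0;
    unsigned int n = signum < 0 ? 0u : (unsigned int)signum;

    /* Produce decimal digits in reverse order. */
    do {
        digits[ndigits++] = (char)('0' + n % 10);
        n /= 10;
    } while (n != 0);

    for (size_t i = 0; i < sizeof prefix - 1 && len < cap; i++)
        buf[len++] = prefix[i];
    while (ndigits > 0 && len < cap)
        buf[len++] = digits[--ndigits];
    if (len < cap)
        buf[len++] = '\n';
    return len;
}

void sighandler(int signum)
{
    char buf[32];
    size_t len = format_signal_msg(buf, sizeof buf, signum);
    ssize_t ignored = write(STDERR_FILENO, buf, len);
    (void)ignored;

    signal(signum, SIG_DFL);   /* back to the default, core-dumping action */
    kill(getpid(), signum);    /* resend the signal to ourselves */
}
```

The handler is registered exactly as in the article, with signal(SIGSEGV, sighandler).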
Hi Alexander:
I am trying to debug a program that uses the exact same method of re-triggering a SIGSEGV fault in the signal handler as you describe. However, when trying to analyze the resulting core dump I cannot seem to get a useful backtrace to where the offending instruction occurred:
(gdb) bt
#0 0xb7b7d3b1 in kill () from /lib/libc.so.6
#1 0xb7f8dd2d in CEventHandler::defaultMachineSignalHandler (signo=11) at ../source/Event.cpp:369
#2 0xb7fc0420 in ?? ()
#3 0x0000000b in ?? ()
#4 0x00000033 in ?? ()
#5 0x00000000 in ?? ()
The stack contents between the initial SIGSEGV and the handler don’t allow me to see where the original fault occurred. The program is embedded, multi-threaded, uses shared objects and is cross-compiled from another system. The system it is running on is:
uname -a
Linux D400 2.6.18.2-ASAT #1 PREEMPT Wed Jul 21 11:40:47 MDT 2010 i686 i686 i386 GNU/Linux
Do you know of a way to unwind the stack before the core dump is generated, so that I can get info on the original fault?
Any help you could provide would be appreciated.
Hi Alex,
I confirm the trouble reported by RickS.
Your solution shows a valid call stack for a core generated on Ubuntu, but not on Red Hat Linux. In that case the backtrace shows something unrelated to the real place that caused the signal to be sent.
So this is probably not a universal method.
Hi alexander,
I double confirm the trouble reported by RickS and Alex.
When I run multi threaded program, the core file seems to point to a place completely unrelated to where the seg fault has occurred. Is there any work around for this??
Say there are 3 thread m (main), t1, t2
if segv happened in t2, I am seeing the core file to point to some unrelated code in m. I think by the time you catch the signal, do cleanup and generate core file, the stack is getting changed..
@engineer
@Alex
@RickS
I think there is a workaround. Instead of sending kill to getpid(), send it to gettid(). See here:
http://www.alexonlinux.com/how-to-obtain-unique-thread-identifier-on-linux
In Linux, when you send a signal to a process, it will be handled by an arbitrary thread. So when you kill getpid(), any thread in the process may handle it. So, instead of sending kill to getpid(), send it to the sick thread and you will get the right backtrace.
I’ll try to confirm that it works some time later.
You can also use raise(signum) – which is guaranteed to send the signal to the calling thread.
nice one.. was searching for something like this..
Why can’t we use SA_RESETHAND?
Why are we re-raising the signal after calling the default handler? The default handler is supposed to generate a core and terminate the application, right?
Here:
http://www.cs.utah.edu/dept/old/texinfo/glibc-manual-0.02/library_21.html#SEC353
There the default handler is not invoked, hence the signal is re-raised. But here I couldn’t see the reason.
hey, thanks Alexander!
it was helpful!!
I am able to handle SIGSEGV, but after that the process goes to sleep,
and if we send the request again the handler is still not invoked.
Is there any solution in which the specific thread (the one that caused the SEGV) can be terminated while all other threads keep running?
Thanks..!
#include <stdio.h>
#include <sys/types.h>
#include <unistd.h>
#include <signal.h>
void sighandler(int signum)
{
    printf("Process %d got signal %d\n", getpid(), signum);
    signal(signum, SIG_DFL);
    kill(getpid(), signum);
}
int main()
{
    signal(SIGSEGV, sighandler);
    printf("Process %d waits for someone to send it SIGSEGV\n",
           getpid());
    int *p = NULL;
    printf("++creating seg fault\n");
    *p = 10;
    return 0;
}
Why doesn’t this one give the proper backtrace? Any help is appreciated.
What did work for me is to reset the handler with signal(signum, SIG_DFL);
and let the handler return.
Vince